Abstract: Starting from a practical use of Reed-Solomon codes
in a cryptographic scheme published in Indocrypt'09, this paper
deals with the threshold of linear q-ary error-correcting codes.
The security of this scheme is based on the intractability of 
polynomial reconstruction when there is too much noise in 
the vector. Our approach switches from this paradigm to an 
Information Theoretical point of view: is there a class of elements 
that are so far away from the code that the list size is always 
superpolynomial? Or, dually speaking, is Maximum-Likelihood 
decoding almost surely impossible? 

We relate this issue to the decoding threshold of a code, and 
show that when the minimal distance of the code is high enough, 
the threshold effect is very sharp. In a second part, we give explicit
lower bounds on the threshold of Maximum-Distance Separable
codes such as Reed-Solomon codes, and compute the threshold
for the toy example that motivates this study.

I. Introduction 

In [1], Bringer et al. proposed a low-cost mutual authentication
protocol that uses a Reed-Solomon code structure.
The protocol is simple: Bob owns two secret polynomials
$P_b, P_b'$ of degree less than $k$, known only to Alice; to
authenticate herself to Bob, Alice proves knowledge of
$P_b$ by sending $(i, P_b(\alpha_i))$, where $\alpha_i$ is the $i$-th element of
$\mathbb{F}_q$. Bob proves his identity by replying with $P_b'(\alpha_i)$. The
protocol is designed so that, if Alice speaks to many parties,
it is hard to trace Bob across the conversations, and it
is hard to impersonate Alice (or Bob). The security of the
protocol is based on an algorithmic assumption: the
polynomial reconstruction problem is hard for the vectors of
$\mathbb{F}_q^n$ that are far enough from the code. Indeed, the best known
algorithms solving polynomial reconstruction are those of
Guruswami-Sudan [2] and, on a related problem, Guruswami-Rudra [3],
which can essentially reconstruct a polynomial given
$\sqrt{kn}$ correct values.
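
As an illustration of the exchange described above, the following minimal sketch plays one round of the protocol. It is a toy under stated assumptions, not the construction of [1]: the prime field $\mathbb{F}_{257}$ stands in for the large field $\mathbb{F}_q$, and the names `evaluate`, `alice_message` and `bob_reply` are hypothetical helpers introduced only for this example.

```python
import random

P = 257  # small prime standing in for the large field size q (assumption)

def evaluate(coeffs, x, p=P):
    """Evaluate a polynomial over F_p at x (coefficients by increasing degree, Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def alice_message(P_b, alphas, i, p=P):
    """Alice proves knowledge of P_b by sending (i, P_b(alpha_i))."""
    return i, evaluate(P_b, alphas[i], p)

def bob_reply(P_b, P_b_prime, alphas, message, p=P):
    """Bob checks Alice's value, then answers with P_b'(alpha_i); returns None on failure."""
    i, value = message
    if value != evaluate(P_b, alphas[i], p):
        return None
    return evaluate(P_b_prime, alphas[i], p)

rng = random.Random(0)
k, n = 4, 16
alphas = list(range(1, n + 1))                    # n distinct field elements
P_b = [rng.randrange(P) for _ in range(k)]        # secret polynomial of degree < k
P_b_prime = [rng.randrange(P) for _ in range(k)]  # second secret polynomial

message = alice_message(P_b, alphas, i=5)
reply = bob_reply(P_b, P_b_prime, alphas, message)
```

An eavesdropper only sees pairs of evaluations; recovering the polynomials from many such noisy pairs is the polynomial reconstruction problem on which the security argument rests.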

This algorithmic security result is somewhat unsatisfying,
for it may be possible to exhibit a better decoding algorithm. We
therefore take an interest in the information-theoretic aspect of
this problem.

The solution of the problem raised by [1] is to look at the
output of a list-decoder centered around the received values,
and to output the possible polynomials as candidate values for
$P_b$ or $P_b'$. Our approach looks at a usually ignored
side of list-decoding: the radii $r$ such that
list-decoding a word with radius $r$ yields a list whose size is always
lower-bounded by a large enough number. This differs from
the literature on list-decoding, which usually looks
for radii for which the list size is always upper-bounded by a
maximum list size, or tries to exhibit a counter-example.

The "large enough" list size can be obtained easily by
requiring Maximum-Likelihood (ML) decoding to be most improbable.
For that, we focus on the all-or-nothing behaviour of
the ML decoder. Inspired by percolation theory [4] and
code-applied graph theory [5], we will show how it is possible to
conservatively estimate, before, after, and around a threshold,
the all-or-nothing probability of ML decoding.

II. The Threshold of a Code 
The existence of a threshold is motivated by the classical
question of percolation: given a graph with a source and
a sink, and given the probability $p$ for a "wet" node of the
graph to "wet" an adjacent node, what is the probability for
the source to wet the sink? It appears that this probability has a
threshold effect; in other words, there exists a limit probability
$p_c$ such that, if $p > p_c$, then the sink is almost surely wet, and
if $p < p_c$, then the sink is almost never wet. The threshold
effect is illustrated in Fig. 1.

This question can be transposed to the probability of
decoding error of a code. Given a proportion of errors $p$ and
a decoding algorithm, what is the probability of correctly
recovering the sent codeword? It was shown in [6] that
for every binary code, and every decoding algorithm, this
probability also follows a threshold.

In this paper, we show that this property also applies to q- 
ary codes. In the following part, we show that the threshold 
behaviour that was seen on binary codes can be obtained again. 

A. The Margulis-Russo Identity 

The technique used to derive threshold effects in discrete 
spaces is to integrate an isoperimetric inequality; for that, the 
Margulis-Russo identity is required. 

Let $H = \{0, 1\}^n$ be the Hamming space; the Hamming
distance $d(x, y)$ is the number of coordinates in which the
vectors $x$ and $y$ differ. Consider the measure $\mu_p : H \to [0,1]$
defined by $\mu_p(x) = p^{w(x)}(1 - p)^{n - w(x)}$, where $w(x)$ is
the Hamming weight of $x$. The number of limit-vectors of a
subset $A \subset H$ is the function defined as $h_A(x) = |B(x, 1) \cap \bar{A}|$
for $x \in A$, where $B(x, 1)$ is the Hamming ball of radius 1 centered
on $x$ and $\bar{A}$ is the complement of $A$.

For $A \subset H$ such that $A$ is increasing (i.e. if $x \in A$ and
$y \geq x$, then $y \in A$, with $\geq$ defined component-wise), Margulis
and Russo showed:

$$\frac{d\mu_p(A)}{dp} = \frac{1}{p} \int_A h_A(x)\, d\mu_p(x)$$

Let $q \in \mathbb{N}$, $q \geq 2$. This section shows that this equality is
also true in $H_q = \{0, \ldots, q - 1\}^n$.

For a vector $x \in H_q$, the support of $x$ is the set of all its
non-null coordinates, i.e. $\mathrm{supp}(x) = \{i \in \{1, \ldots, n\} : x_i \neq 0\}$.
Define the measure function
$\mu_p(x) = \left(\frac{p}{q-1}\right)^{w(x)} (1 - p)^{n - w(x)}$,
with $w(x) = |\mathrm{supp}(x)|$ the weight of $x$. This
definition is consistent with a measure, as
$\mu_p(H_q) = \sum_{x \in H_q} \mu_p(x) = 1$.
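
The normalisation claim can be checked by exhaustive enumeration on a small space. The sketch below is only a sanity check under assumed toy parameters ($n = 4$, $q = 3$); the function names are ours.

```python
from itertools import product

def weight(x):
    """Number of non-null coordinates of x, i.e. the size of its support."""
    return sum(1 for xi in x if xi != 0)

def mu(x, p, q):
    """Measure of a single vector: (p / (q-1))^w(x) * (1-p)^(n - w(x))."""
    n = len(x)
    w = weight(x)
    return (p / (q - 1)) ** w * (1 - p) ** (n - w)

def total_mass(n, q, p):
    """Sum of mu over the whole space H_q = {0, ..., q-1}^n."""
    return sum(mu(x, p, q) for x in product(range(q), repeat=n))

print(total_mass(4, 3, 0.3))  # expected to be 1 up to floating-point rounding
```

The sum equals 1 because there are $\binom{n}{w}(q-1)^w$ vectors of weight $w$, so the total collapses to the binomial expansion of $(p + (1-p))^n$.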

We write $\subseteq$ for the (non-strict) inclusion relation between a set and a
subset (i.e. for all $X$, $X \subseteq X$). The support inclusion
generalises the component-wise $\leq$ that was used in the binary
case.

Lemma 1 (Margulis-Russo Identity over q-ary alphabets):
Let $A$ be an increasing subset of $H_q$, i.e. such that if $y \in A$,
then every $x \in H_q$ such that $\mathrm{supp}(y) \subseteq \mathrm{supp}(x)$ also belongs to $A$.
Then

$$\frac{d\mu_p(A)}{dp} = \frac{1}{p} \int_A h_A(x)\, d\mu_p(x)$$



Proof: The proof of this lemma is an adaptation of
Margulis' proof in [7]. For this, we use the notation:

• $[A, B] = |\{(x, y) \in A \times B : d(x, y) = 1\}|$, where $A, B \subset H_q$,
is the number of links from $A$ to $B$;

• for $k \in \{0, \ldots, n\}$, $Z_k = \{x \in H_q : w(x) = k\}$;

• for $A \subset H_q$, $A_k = A \cap Z_k$ ($A$ is the union of the $A_k$);

• $D_k = \sum_{x \in A_k} h_A(x)$ is the number of limit-vectors next
to elements of weight $k$.

Trivially, $D_k = [A_k, Z_{k+1} - A_{k+1}] + [A_k, Z_{k-1} - A_{k-1}] +
[A_k, Z_k - A_k]$. We now note that:

• $[A_k, Z_{k-1}] = |A_k|\,k$, as the only way to go from $A_k$ to $Z_{k-1}$
(in one move) is to set one coordinate to 0;

• $[A_k, Z_{k+1}] = |A_k|(n - k)(q - 1)$ with the same reasoning;

• $[A_k, Z_k - A_k] = [A_k, Z_{k+1} - A_{k+1}] = 0$ as $A$ is
increasing;

• combining these equalities, we get $[A_k, A_{k+1}] =
|A_k|(n - k)(q - 1)$;

• no link of $[A_k, Z_k]$ joins two vectors with different supports, as
this would require setting a non-null coordinate to 0 and a null one
to a value in $\{1, \ldots, q - 1\}$.

Finally, $D_k = [A_k, Z_{k-1}] - [A_k, A_{k-1}] = k|A_k| - (n - k +
1)(q - 1)|A_{k-1}|$ for $k \geq 1$, and $D_0 = 0$ (or $A = H_q$).
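Back to the desired identity, we observe that

$$\int_A h_A(x)\, d\mu_p(x) = \sum_{k=0}^{n} \sum_{x \in A_k} h_A(x) \left(\frac{p}{q-1}\right)^k (1-p)^{n-k} = \sum_{k=1}^{n} \left(k|A_k| - (n-k+1)(q-1)|A_{k-1}|\right) \left(\frac{p}{q-1}\right)^k (1-p)^{n-k};$$

on the other hand,

$$\frac{d\mu_p(A)}{dp} = \sum_{k=0}^{n} |A_k| \left(\frac{p}{q-1}\right)^k (1-p)^{n-k} \left(\frac{k}{p} - \frac{n-k}{1-p}\right).$$

Shifting the index in the terms in $|A_{k-1}|$ shows that the first sum
equals $p$ times the second. Hence the identity. ■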
This lemma shows that the Margulis-Russo identity also
holds on $\{0, \ldots, q - 1\}^n$; it was the keystone of the reasoning
done in [5] to show an explicit form of the threshold behaviour
of Maximum-Likelihood Error Correction.
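
The q-ary Margulis-Russo identity can be checked numerically on a small example. The sketch below is an illustration under assumptions of ours: the parameters $n = 4$, $q = 3$, the particular increasing set $A$, and a central finite difference standing in for the derivative. Here $h_A(x)$ is taken to be the number of vectors at Hamming distance 1 from $x$ that lie outside $A$.

```python
from itertools import product

def weight(x):
    return sum(1 for xi in x if xi != 0)

def mu(x, p, q):
    n, w = len(x), weight(x)
    return (p / (q - 1)) ** w * (1 - p) ** (n - w)

def neighbours(x, q):
    """All vectors at Hamming distance exactly 1 from x."""
    for i, xi in enumerate(x):
        for v in range(q):
            if v != xi:
                yield x[:i] + (v,) + x[i + 1:]

def measure(A, p, q):
    return sum(mu(x, p, q) for x in A)

def boundary_integral(A, p, q):
    """Integral over A of h_A, the number of neighbours of x outside A."""
    return sum(mu(x, p, q) * sum(1 for y in neighbours(x, q) if y not in A)
               for x in A)

n, q = 4, 3
# Increasing with respect to support inclusion: membership depends only on
# the support, and enlarging the support preserves it.
A = {x for x in product(range(q), repeat=n) if x[0] != 0 or weight(x) >= 3}

p, eps = 0.4, 1e-6
lhs = (measure(A, p + eps, q) - measure(A, p - eps, q)) / (2 * eps)  # d mu_p(A) / dp
rhs = boundary_integral(A, p, q) / p                                  # (1/p) * integral of h_A
print(lhs, rhs)
```

Both printed values should agree up to the finite-difference error.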

B. A Threshold for Error-Decoding q-ary codes 

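In the following, we use $\varphi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ the normal
distribution, $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt$ the cumulative normal function,
and $\Psi(x) = \varphi(\Phi^{-1}(x))$ (so that $\forall x$, $\Psi(x) \cdot (\Phi^{-1})'(x) = 1$).

A monotone property is a set $A \subset H_q$ such that $A$ is
increasing, or $\bar{A}$ is increasing.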

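Theorem 1: Let $A$ be a monotone property of $H_q$. Suppose
that $\forall x \in A$, $h_A(x) = 0$ or $h_A(x) \geq \Delta$.

Let $\theta \in [0, 1]$ be (the unique real) such that $\mu_\theta(A) = \frac{1}{2}$. Let

$$g_\theta(p) = \Phi\left(\sqrt{2\Delta}\left(\sqrt{-\ln \theta} - \sqrt{-\ln p}\right)\right).$$

Then the measure of $A$, $\mu_p(A)$, is bounded by:

$$\mu_p(A) \leq g_\theta(p) \ \text{for } p \in\, ]0;\theta], \qquad \mu_p(A) \geq g_\theta(p) \ \text{for } p \in [\theta;1[.$$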



Sketch of Proof
The proof is exactly the same as the one from [5]. The whole
idea is to derive the bound:

$$\int_A \sqrt{h_A}\, d\mu_p \geq \frac{1}{\sqrt{2 \ln \frac{1}{p}}}\, \Psi(\mu_p(A)).$$

The integration of this inequality, together with the Margulis-Russo
lemma, gives the result. ■

To conclude this part, we remark that the non-decoding
region of a given point, for a $q$-ary code, is an increasing
region of $\mathbb{F}_q^n$. For linear codes, this non-decoding region can
always be translated to that of 0 without loss of generality; let
$A_0 = \{x \in \mathbb{F}_q^n \text{ s.t. } \exists c \in C, c \neq 0 : d(x, c) < d(x, 0)\}$. The
probability of decoding error of $C$ is then $\mu_p(A_0)$.

For $x \in A_0$, we show that either $h_{A_0}(x) = 0$, or
$h_{A_0}(x) \geq \frac{d}{2}$. Indeed, if $h_{A_0}(x) > 0$, then $x$ is nearer to a
non-null codeword $c$ than to 0. Then all the vectors obtained
by replacing one of the coordinates of $x$ by 0 are out of $A_0$; in
particular, $h_{A_0}(x) \geq d(x, 0)$. Let $d_c = d(c, 0)$ be the weight
of $c$; as $x$ is nearer to $c$ than to 0, $d(x, 0) \geq \frac{d_c}{2} \geq \frac{d}{2}$. Thus the
previous assertion.

Figure 1. Illustration of the threshold effect, $d = 400$, $p_c = 0.7$

Combining the previous results, we have shown that for any
$q$-ary code, the probability of error is, as for binary codes,
bounded by a threshold function. This can be expressed by
the following theorem, which has the same form as the one
shown in [5]:

Theorem 2: Let $C$ be a code of any length, and of minimal
distance $d$. Over the $q$-ary symmetric channel, with transition
probability $p$, the probability of decoding error $P_e(p)$ associated
with $C$ is such that there exists a unique $p_c \in\, ]0; 1[$ such
that $P_e(p_c) = \frac{1}{2}$, and $P_e$ is bounded by:

$$P_e(p) \lessgtr 1 - \Phi\left(\sqrt{d}\left(\sqrt{-\ln(1 - p_c)} - \sqrt{-\ln(1 - p)}\right)\right)$$

The upper-bound ($\leq$) is true when $p \in\, ]0; p_c]$; the lower-bound
($\geq$) is true when $p \in [p_c; 1[$.
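
To give a feel for how sharp this bound is, the sketch below evaluates the bounding function for the parameters of Figure 1 ($d = 400$, $p_c = 0.7$). It assumes the bound reads $1 - \Phi(\sqrt{d}(\sqrt{-\ln(1-p_c)} - \sqrt{-\ln(1-p)}))$; the function names are ours, and $\Phi$ is computed from the standard error function.

```python
import math

def Phi(x):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def threshold_bound(p, p_c, d):
    """Bounding function of P_e: upper bound for p <= p_c, lower bound for p >= p_c."""
    a = math.sqrt(-math.log(1.0 - p_c))
    b = math.sqrt(-math.log(1.0 - p))
    return 1.0 - Phi(math.sqrt(d) * (a - b))

d, p_c = 400, 0.7  # parameters of Figure 1
for p in (0.5, 0.65, 0.7, 0.75, 0.9):
    print(f"p = {p:.2f}  bound = {threshold_bound(p, p_c, d):.3e}")
```

The function takes the value 1/2 at $p = p_c$ by construction, and moves towards 0 or 1 within a small distance of the threshold.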

Even though linearity was used earlier so as not to lose any
generality, it is not a requirement for this theorem. Indeed, the
bounding equations are true for every codeword $c$ when $d$ is replaced
by $\min_{c' \in C, c' \neq c} d(c, c')$. Assuming that the codewords sent
are distributed uniformly over $C$, we thus obtain this
result.

The behaviour of this function is illustrated in Fig. 1. Around
$p \approx 0$ (actually, for all $p < p_c - \epsilon$), $P_e$ is extremely flat
around 0; around $p \approx 1$ (and, symmetrically, for all $p > p_c + \epsilon$),
$P_e$ is extremely flat around 1. Finally, around the threshold
$p_c$, the slope is proportional to $\sqrt{d}$, which is almost vertical when the
minimal distance $d$ is large.



III. Explicit Computation of the Threshold for 
Maximum-Distance Separable Codes 

In this section, we only take interest in linear codes over $\mathbb{F}_q$.

A. Another Estimation of the Decoding Threshold 

By linearity, we can again assume without loss of generality
that the sent codeword was the all-null vector. It is possible to
obtain a rough estimate of the probability of wrongly decoding
a vector received with crossover probability $p$ by computing
the proportion of vectors $x \in \mathbb{F}_q^n$ of weight less than or equal to
$np$ that are closer to a non-null codeword than to 0. Let $g(p)$
be this proportion:

$$g(p) = \frac{\left|\{x : \exists c \in C, c \neq 0 : d(x, c) \leq w(x) \leq np\}\right|}{\left|\{x : w(x) \leq np\}\right|}$$

Let $\mathrm{vol}(q, n, t) = \frac{1}{n} \log_q(|B(t)|)$, where $B(t)$ is the Hamming
ball of radius $t$ (for example, centered on 0) in $\mathbb{F}_q^n$. It is
well known that when $t \leq (1 - \frac{1}{q})n$, $\mathrm{vol}(q, n, t) = H_q(\frac{t}{n}) + o_n(1)$,
where $H_q(x) = -x \log_q x - (1 - x) \log_q(1 - x) + x \log_q(q - 1)$
is the $q$-ary entropy of $x \in [0, 1]$.
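
The entropy approximation of the ball volume can be observed directly on moderate parameters. The sketch below is a numerical illustration with assumed values ($q = 4$, $n = 400$, $t = 100$); `ball_size`, `vol` and `entropy` are names of ours.

```python
import math

def ball_size(q, n, t):
    """Exact number of vectors of F_q^n at Hamming distance at most t from a fixed centre."""
    return sum(math.comb(n, i) * (q - 1) ** i for i in range(t + 1))

def vol(q, n, t):
    """Normalised volume (1/n) * log_q |B(t)|."""
    return math.log(ball_size(q, n, t), q) / n

def entropy(q, x):
    """q-ary entropy H_q(x) for 0 < x < 1."""
    return (-x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q)
            + x * math.log(q - 1, q))

q, n, t = 4, 400, 100
print(vol(q, n, t), entropy(q, t / n))
```

For $t/n \leq 1 - 1/q$ the normalised volume never exceeds the entropy, and the gap is at most $\log_q(n+1)/n$, which vanishes as $n$ grows.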

To compute the numerator, we suggest, for each codeword
$c \in C$ that has a weight between $d$ and $2pn$, to compute the
number of vectors $x$ that are nearer to $c$ than to 0. This number
actually only depends on the weight of $c$, and will be noted
$v_{pn}(w(c))$. As there are $A_{w(c)}$ codewords of weight $w(c)$ in
the code (with the standard notation), the function $g(p)$ can
be approximated by:

$$g(p) \leq \frac{\sum_{w=d}^{2pn} A_w\, v_{pn}(w)}{q^{\,n\,\mathrm{vol}(q,n,pn)}} \qquad (1)$$



The different quantities used in this equation are illustrated
in Fig. 2.

Figure 2. Different quantities used in Eq. (1) (labels: $t = pn$, $2t$, $d/2$)

$v_t(w)$ is made explicit hereafter. Let $c$ be a codeword of weight
$w$. Let $x \in \mathbb{F}_q^n$ be a vector with the following constraints:

• $d(x, 0) \leq t$, i.e. $x$ is the result of the transmission of
0 with at most $t$ errors.

• $d(x, 0) \geq d(x, c)$, i.e. $x$ is wrongly decoded.

We note $\alpha$ the number of coordinates $i$ in $x$ such that $x_i \neq c_i$
and $x_i = 0$; $\beta$ is the number of coordinates $i$ such that $x_i \neq c_i$,
$c_i \neq 0$ and $x_i \neq 0$; $\gamma$ is the number of coordinates $i$ such that $x_i \neq c_i$
and $c_i = 0$.

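The previous constraints on $x$ can be rewritten into the
system $(S)$:

$$(S): \begin{cases} 1)\ 0 \leq \alpha, \beta \leq w \\ 2)\ 0 \leq \gamma \leq n - w \\ 3)\ \gamma \leq t + \alpha - w \\ 4)\ \beta + \gamma \leq t \\ 5)\ 2\alpha + \beta \leq w \end{cases}$$

We then obtain

$$v_t(w) = \sum_{(\alpha,\beta,\gamma) \in (S)} \binom{w}{\alpha+\beta} \binom{\alpha+\beta}{\beta} (q-2)^{\beta} \binom{n-w}{\gamma} (q-1)^{\gamma}.$$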

Remark 1: It is easy to see that $v_t(w)$ is at most the volume
of a ball of radius $w - \frac{d}{2}$; this estimation will be used in the
next part.
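
The counting argument behind $v_t(w)$ can be cross-checked against exhaustive enumeration on tiny parameters. The sketch below assumes the following parametrisation: on the support of $c$, $\alpha$ coordinates of $x$ are null and $\beta$ are non-null but differ from $c$; off the support, $\gamma$ coordinates of $x$ are non-null. The weight of $x$ is then $w - \alpha + \gamma$ and $d(x, c) = \alpha + \beta + \gamma$. The function names and test parameters are ours.

```python
import math
from itertools import product

def v_formula(q, n, w, t):
    """Count the x with w(x) <= t and d(x, c) <= w(x), for a codeword c of weight w."""
    total = 0
    for alpha in range(w + 1):
        for beta in range(w - alpha + 1):
            if 2 * alpha + beta > w:        # d(x, c) <= w(x)
                continue
            for gamma in range(n - w + 1):
                if gamma > t + alpha - w:   # w(x) <= t (this also implies beta + gamma <= t)
                    continue
                total += (math.comb(w, alpha + beta) * math.comb(alpha + beta, beta)
                          * (q - 2) ** beta
                          * math.comb(n - w, gamma) * (q - 1) ** gamma)
    return total

def v_bruteforce(q, n, w, t):
    """Same count by enumerating F_q^n, with c = (1, ..., 1, 0, ..., 0) of weight w."""
    c = (1,) * w + (0,) * (n - w)
    count = 0
    for x in product(range(q), repeat=n):
        wx = sum(1 for xi in x if xi != 0)
        dxc = sum(1 for xi, ci in zip(x, c) if xi != ci)
        if wx <= t and dxc <= wx:
            count += 1
    return count

print(v_formula(3, 5, 3, 3), v_bruteforce(3, 5, 3, 3))
```

The enumeration is exponential in $n$ and only serves as a check; the closed formula is the one that scales to realistic parameters.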



B. Application to MDS codes 

Maximum-Distance Separable (MDS) codes are codes whose
dimension $k$ and minimal distance $d$ meet the
Singleton bound with equality, so that:

$$k + d = n + 1.$$

A well known family of MDS codes are the Reed-Solomon
codes, for which a codeword is made of the evaluation of a
polynomial of degree at most $k - 1$ over $n$ field elements $\alpha_1, \ldots, \alpha_n$.
Reed-Solomon codes over $\mathbb{F}_q$ can have a length up to $q - 1$,
but shorter such codes are also MDS.

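For MDS codes, the number $A_l$ of codewords of given
weight $l$ is known. This number is:

$$A_l = \sum_{j=n-l}^{n-d} (-1)^{j-(n-l)} \binom{j}{n-l} \binom{n}{j} \left(q^{n-d+1-j} - 1\right)$$

From this identity, it is easy to derive the more usable
formula:

$$A_l = \binom{n}{l} \sum_{j=0}^{l-d} (-1)^j \binom{l}{j} \left(q^{l+1-d-j} - 1\right) \qquad (2)$$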
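
The weight distribution of an MDS code is easy to evaluate with exact integer arithmetic. The sketch below uses the standard closed form $A_l = \binom{n}{l} \sum_{j=0}^{l-d} (-1)^j \binom{l}{j} (q^{l+1-d-j} - 1)$ and checks it on an assumed toy code, a $[7, 3, 5]$ Reed-Solomon code over $\mathbb{F}_8$; the function name is ours.

```python
import math

def mds_weight_distribution(n, k, q):
    """Return {weight: number of codewords} for an [n, k, n-k+1] MDS code over F_q."""
    d = n - k + 1
    A = {0: 1}  # the all-null codeword
    for l in range(d, n + 1):
        A[l] = math.comb(n, l) * sum(
            (-1) ** j * math.comb(l, j) * (q ** (l + 1 - d - j) - 1)
            for j in range(l - d + 1)
        )
    return A

A = mds_weight_distribution(7, 3, 8)
print(A)  # the counts must add up to q^k = 512 codewords
```

Two sanity checks follow from the theory: the counts sum to $q^k$, and the number of minimum-weight codewords is $\binom{n}{d}(q - 1)$.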

It is now possible to approximate quite nicely the error 
probability while under the threshold - indeed, the numerator 
and denominator are correct as long as a vector x is not close 
to 2 different codewords with a weight in the range [d;pn], 
i.e. as long as the list of codewords at a distance less than pn 
from x is reduced to a single element. 

C. Short MDS Codes over Large Fields 

We now focus on the specific problem presented in the
Introduction, and motivated by the beckoning and authentication
protocol from [1]. This setting is characterized by the
following:

• The underlying code is a Reed-Solomon code over a field $\mathbb{F}_q$;

• The field size q is very large for cryptographic reasons; 

• The code length n is very short (with respect to q) as nq 
is the size of embedded low-cost devices' memory. 

This application fits into the framework depicted in the
previous sections. Moreover, the information "$n$ much smaller
than $q$" ($n = o(q)$) makes it possible to compute an asymptotic
first-order estimation of the threshold in such codes.

Indeed, if $g(p) \leq f(p)$, then $g^{-1}(\frac{1}{2}) \geq f^{-1}(\frac{1}{2})$. We now
compute an upper-bound on $g(p)$, to derive an estimation on
the threshold $\theta$. More precisely, we aim at computing $\iota(p)$,
the first-order value of $\log_q(g(p))$; then, $\iota^{-1}(0)$ is a
lower-approximation of the threshold.

To estimate the weight enumerator $A_l$, we use formula (2)
to derive

$$A_l \leq (l - d + 1) \binom{n}{l} 2^l q^{l+1-d} \leq n\, 2^{n+l} q^{l+1-d}.$$

The number of targeted vectors for each codeword, $v_t(l)$,
is not easy to evaluate; we note its first order development
$\log_q v_t(l) = n\mu(l, t) + o_q(1)$, so that $v_t(l) \leq q^{n\mu(l,t)} \cdot o_q(q)$.

(Here, the term $o(q)$ is bounded by a polynomial in $n$.) We
know that

$$0 \leq n\mu(l, t) \leq l - \frac{d}{2} \qquad (3)$$

Combining these elements with equation (1), we obtain

$$g(p) \leq \sum_{l=d}^{2pn} o(q)\, q^{1+l-d+n\mu(l,pn)-n\,\mathrm{vol}(q,n,pn)}.$$

As $\mathrm{vol}(q, n, t) = H_q(\frac{t}{n}) + o_n(1) = \frac{t}{n} + o_q(1)$,
the first order of $g(p)$ is bounded by: $\log_q g(p) \leq
\max_{l \in [d; 2pn]} (1 + l - d - pn + n\mu(l, pn)) + o_q(1)$.

The bounding (3) of $\mu$ shows that the right-hand side of
this inequality is between $1 + pn - d$ and $1 + 3pn - \frac{3d}{2}$, which
shows that the threshold $g^{-1}(\frac{1}{2})$ is asymptotically between $\frac{d}{2n}$
and $\frac{d}{n}$.
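
The two first-order exponents can be solved for $p$ explicitly. The sketch below assumes the exponents are $1 + pn - d$ and $1 + 3pn - \frac{3d}{2}$, and computes the value of $p$ at which each one vanishes; the function name is ours, and the parameters are those of the $(2048, 256, 1793)$ code considered in the numerical application.

```python
from fractions import Fraction

def first_order_threshold_bounds(n, d):
    """Roots in p of 1 + 3pn - 3d/2 = 0 and of 1 + pn - d = 0, as exact fractions."""
    lower = Fraction(3 * d - 2, 6 * n)  # close to d / (2n)
    upper = Fraction(d - 1, n)          # close to d / n
    return lower, upper

lower, upper = first_order_threshold_bounds(2048, 1793)
print(float(lower), float(upper))
```

For these parameters the larger root is exactly $1792/2048 = 0.875$, and the smaller one is close to $d/(2n) \approx 0.44$.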

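Unfortunately, a more precise evaluation of $\mu$ strongly
depends on the context. Indeed, according to Section III-A,

$$\mu(l, t) = \frac{1}{n} \log_q \max_{(\alpha,\beta,\gamma) \in (S)} q^{\beta+\gamma} \binom{n-l}{\gamma} \binom{l}{\alpha+\beta} \binom{\alpha+\beta}{\beta} + o_q(1).$$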

This maximum can be obtained by evaluating the term to
be maximized on all vertices of the polytope defined by the
system $(S)$ ($(S)$ is made of 9 inequalities in 3 unknowns; the
vertices are obtained by selecting 3 of these inequalities, thus at
most $\binom{9}{3} = 84$ vertices); however, it is not possible to exhibit
here a general answer as the solution depends on the minimal
distance of the code, i.e. on the rate of the Reed-Solomon
code.

D. Numerical Application to a $(2048, 256, 1793)_{2^{64}}$ MDS
Code

In the case of a code over a finite field of reasonable 
dimension, it is possible to exactly compute the ratio that 
approximates the Maximum Likelihood threshold. However, 
the exact threshold cannot be easily computed yet; it is still 
an open problem related to the list-decoding capacity of Reed- 
Solomon codes. 

We therefore used the NTL open-source library [8] to
compute the values $A_l$, $v_t(l)$ and $|B(t)|$ in order to have
an accurate enough approximation of the function $g(p)$
described earlier. The parameters are those that were proposed
in [1], and show that the decoding threshold of such a code
is between 0.8 and 0.875.

The slope around the threshold is around 115, so for $p$
"small" (in fact, a bit smaller than $p_c$) $g(p)$ is very near to 0,
while as $p$ goes to 1, $g(p)$ is much greater than the maximum
probability of 1. This was predicted earlier, and expresses the
fact that the list-size of radius $pn$ is always greater than 1.
The threshold value $g^{-1}(\frac{1}{2}) \approx \iota^{-1}(0)$ is a lower-bound for
the threshold of the code, though intuition suggests that this
lower-bound is quite near to the real threshold.

IV. Conclusion 

As a conclusion, let us look back to the starting point of
our reasoning. The initial goal was to revise the conditions of
security of the construction depicted in [1]: from a received
vector $x$ of $\mathbb{F}_q^n$, for what parameters is the size of the
list of radius $pn$ exponentially large? This problem can be
reduced to that of the threshold probability of a linear
error-correcting code. Indeed, when
the minimal distance of the code is large enough, the decoding
error probability of the code is exponentially small below the
threshold, and exponentially close to 1 above it. For our class of
parameters, ensuring that the error rate is above the threshold
is enough to show the security of the scheme.

We then showed that the threshold behaviour can be made
explicit for $q$-ary codes as well as for binary codes, and
gave an explicit lower-bound on the threshold of MDS codes.

Applying these results to the initial problem, we show that
the threshold for a (highly) truncated Reed-Solomon code over
a finite field $\mathbb{F}_{2^{64}}$ is very near to the normalized minimal
distance of this code, where $d = n - k + 1$. As a conclusion, to
switch from an algorithmic assumption (the hardness of the
Polynomial Reconstruction Problem, see [1]) to
Information-Theoretical security, we recommend raising the dimension $k$
of the underlying code. This lowers the decoding threshold of
the code; the downside is that storage of a codeword is more
costly.